@TaoChenOSU (Contributor) commented Feb 6, 2026

Motivation and Context

Closes: #3530, #3529, #1665

Description

  1. Simplify checkpoint encoding and decoding strategies.
    • WorkflowCheckpoint now holds live Python objects instead of serialized JSON, which makes checkpoints much easier to work with in code.
    • Serialization and deserialization happen only when a checkpoint is written to or read from a file, which leaves room to support other serialization protocols.
    • FileCheckpointStorage now uses pickle.
    • InMemoryCheckpointStorage now stores raw checkpoints (i.e., no serialization).
  2. Remove workflow_id from checkpoints
  3. Add previous_checkpoint_id to checkpoints
  4. Rename the checkpoint storage APIs
  5. Remove _checkpoint_summary.py and _conversation_state.py
  6. Remove deprecated checkpoint hooks

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

TaoChenOSU self-assigned this Feb 6, 2026
TaoChenOSU added the python and workflows labels Feb 6, 2026
github-actions bot changed the title WIP: Checkpoint refactor: encode/decode, checkpoint format, etc to Python: WIP: Checkpoint refactor: encode/decode, checkpoint format, etc Feb 6, 2026
TaoChenOSU changed the title Python: WIP: Checkpoint refactor: encode/decode, checkpoint format, etc to [BREAKING] Python: WIP: Checkpoint refactor: encode/decode, checkpoint format, etc Feb 6, 2026
@markwallace-microsoft (Member) commented Feb 9, 2026

Python Test Coverage

Python Test Coverage Report

File                                    Stmts   Miss  Cover  Missing
packages/core/agent_framework
   observability.py                       609     84    86%  336, 338–340, 343–345, 350–351, 357–358, 364–365, 372, 374–376, 379–381, 386–387, 393–394, 400–401, 408, 664, 667, 675–676, 679–682, 684, 687–689, 692–693, 721, 723, 734–736, 738–741, 745, 753, 854, 856, 1005, 1007, 1011–1016, 1018, 1021–1025, 1027, 1139–1140, 1142, 1193–1194, 1329, 1377–1378, 1494–1496, 1555, 1725, 1879, 1881
packages/core/agent_framework/_workflows
   _agent_executor.py                     164     22    86%  96, 144, 162–163, 217–218, 220–221, 251–253, 261–263, 273–275, 277, 281, 285, 289–290
   _checkpoint.py                         145      0   100%
   _checkpoint_encoding.py                 47      0   100%
   _events.py                             149     21    85%  90–91, 224, 228, 230, 262, 337, 352, 367, 382, 394–396, 408–410, 412–413, 415–416, 420
   _runner.py                             170      2    98%  272–273
   _runner_context.py                     148     11    92%  63, 77–78, 80–81, 83, 369, 388, 441, 454, 458
   _workflow.py                           257     20    92%  87, 267–269, 271–272, 290, 294, 322, 424, 608, 629, 685, 697, 703, 708, 728–730, 743
   _workflow_builder.py                   266     35    86%  262, 532, 630, 637–638, 738, 741, 746, 748, 755, 758–762, 764, 825, 899, 902, 958, 976, 989, 1003–1010, 1012, 1015, 1017–1019
   _workflow_executor.py                  178     30    83%  94, 444, 467, 469, 477–478, 483, 485, 490, 492, 545, 573–579, 583–585, 593, 598, 609, 619, 623, 629, 633, 643, 647
packages/orchestrations/agent_framework_orchestrations
   _group_chat.py                         286     51    82%  174, 337, 344, 373, 384–385, 391, 396, 412, 439–444, 446, 479–482, 484, 489–493, 623, 626, 629, 632, 640, 653, 656, 669, 678, 684, 728–729, 733–734, 748–749, 751–752, 783–784, 850, 869, 877, 891, 901
   _orchestration_state.py                 27      5    81%  22, 30, 69, 84–85
TOTAL                                   16327   1974    87%

Python Unit Test Overview

Tests   Skipped   Failures   Errors   Time
3973    225       0          0        1m 10s

TaoChenOSU marked this pull request as ready for review February 9, 2026 23:04
Copilot AI review requested due to automatic review settings February 9, 2026 23:04

Copilot AI (Contributor) left a comment

Pull request overview

Refactors Python workflow checkpointing to store live objects in WorkflowCheckpoint and defer serialization to storage backends (with a new pickle+base64 encoding strategy), while updating the runner/workflow APIs, samples, DevUI, and tests to the new checkpoint format and storage interfaces.

Changes:

  • Redesign checkpoint payloads: replace workflow_id with workflow_name + graph_signature_hash, add previous_checkpoint_id, and store message/event objects directly.
  • Update checkpoint storage APIs (save/load/delete/get_latest/list_*) and switch file persistence to JSON wrappers containing pickled payloads.
  • Update workflow/runner checkpoint handling and adjust samples/tests/telemetry to the new semantics.
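The overview describes file persistence as JSON wrappers around pickle+base64 payloads, with a type check on decode. A minimal sketch of that scheme follows; the marker keys `__pickled__` and `__type__` and all helper names are assumptions for illustration, not the names used in `_checkpoint_encoding.py`.

```python
import base64
import json
import pickle
from typing import Any

# Hypothetical marker keys; the real encoding module may use different names.
PICKLE_MARKER = "__pickled__"
TYPE_MARKER = "__type__"


def _type_key(obj: Any) -> str:
    cls = type(obj)
    return f"{cls.__module__}:{cls.__qualname__}"


def encode_value(obj: Any) -> dict[str, str]:
    """Pickle an object and wrap the base64 text in a JSON-safe dict."""
    payload = base64.b64encode(pickle.dumps(obj)).decode("ascii")
    return {PICKLE_MARKER: payload, TYPE_MARKER: _type_key(obj)}


def decode_value(data: dict[str, str]) -> Any:
    """Unpickle a wrapped value and check it matches the recorded type."""
    obj = pickle.loads(base64.b64decode(data[PICKLE_MARKER]))
    if _type_key(obj) != data[TYPE_MARKER]:
        raise TypeError(f"expected {data[TYPE_MARKER]}, got {_type_key(obj)}")
    return obj


def to_json(payload: dict[str, Any]) -> str:
    # The outer document stays valid JSON; only the values are opaque.
    return json.dumps({key: encode_value(value) for key, value in payload.items()})


def from_json(text: str) -> dict[str, Any]:
    return {key: decode_value(value) for key, value in json.loads(text).items()}
```

Note that the type check runs only after unpickling, so it catches mismatched or corrupted entries but is not a security boundary: `pickle.loads` can execute arbitrary code, so checkpoint files should only be loaded from trusted sources.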

Reviewed changes

Copilot reviewed 43 out of 44 changed files in this pull request and generated 5 comments.

File Description
python/samples/getting_started/workflows/checkpoint/workflow_as_agent_checkpoint.py Update checkpoint listing to use workflow.name.
python/samples/getting_started/workflows/checkpoint/sub_workflow_checkpoint.py Update checkpoint listing to use workflow.name.
python/samples/getting_started/workflows/checkpoint/handoff_with_tool_approval_checkpoint_resume.py Remove old sample (moved/replaced).
python/samples/getting_started/workflows/checkpoint/checkpoint_with_resume.py Update checkpoint listing to use workflow.name.
python/samples/getting_started/workflows/checkpoint/checkpoint_with_human_in_the_loop.py Remove checkpoint summary usage; update listing to workflow.name.
python/samples/getting_started/orchestrations/magentic_checkpoint.py Update checkpoint listing to use workflow.name / workflow_name.
python/samples/getting_started/orchestrations/handoff_with_tool_approval_checkpoint_resume.py New orchestrations sample demonstrating resume with approvals using new APIs.
python/packages/orchestrations/tests/test_sequential.py Update checkpoint API usage and selection logic for resume tests.
python/packages/orchestrations/tests/test_magentic.py Update checkpoint API usage and selection logic; adjust load/delete API calls.
python/packages/orchestrations/tests/test_handoff.py Update checkpoint listing to use workflow.name.
python/packages/orchestrations/tests/test_group_chat.py Update checkpoint listing to use workflow.name.
python/packages/orchestrations/tests/test_concurrent.py Update checkpoint API usage and selection logic for resume tests.
python/packages/orchestrations/agent_framework_orchestrations/_orchestration_state.py Change orchestration checkpoint state to store live objects directly.
python/packages/orchestrations/agent_framework_orchestrations/_group_chat.py Store live cache objects in executor checkpoint state.
python/packages/devui/agent_framework_devui/_server.py Update DevUI delete API call to storage.delete().
python/packages/devui/agent_framework_devui/_executor.py Update DevUI checkpoint listing to filter by workflow.name.
python/packages/core/tests/workflow/test_workflow_observability.py Update OTEL attribute expectations and checkpoint message assertions to object-based payloads.
python/packages/core/tests/workflow/test_workflow_agent.py Update checkpoint listing to use workflow.name.
python/packages/core/tests/workflow/test_workflow.py Update checkpoint model fields + storage API names; update exception assertions.
python/packages/core/tests/workflow/test_sub_workflow.py Update checkpoint listing to use workflow.name.
python/packages/core/tests/workflow/test_serialization.py Update expectation: workflow.name is always populated.
python/packages/core/tests/workflow/test_runner.py Update Runner ctor signature; add extensive checkpoint/restore tests.
python/packages/core/tests/workflow/test_request_info_event_rehydrate.py Rewrite tests around pickled checkpoint encoding and request_info restore behavior.
python/packages/core/tests/workflow/test_request_info_and_response.py Remove duplicated checkpoint test (moved to rehydrate suite).
python/packages/core/tests/workflow/test_checkpoint_validation.py Update checkpoint listing to use workflow.name.
python/packages/core/tests/workflow/test_checkpoint_encode.py Update encoding tests for pickle marker/type marker approach.
python/packages/core/tests/workflow/test_checkpoint_decode.py Update decode tests for pickle/type-marker verification.
python/packages/core/tests/workflow/test_checkpoint.py Major expansion of storage roundtrip tests; new API names and checkpoint fields.
python/packages/core/tests/workflow/test_agent_executor.py Update checkpoint listing and selection logic for restore test.
python/packages/core/agent_framework/observability.py Add workflow builder OTEL attributes.
python/packages/core/agent_framework/_workflows/_workflow_executor.py Store execution contexts directly; rely on workflow-level filtering for handled request_info events.
python/packages/core/agent_framework/_workflows/_workflow_builder.py Always assign a builder name (UUID if omitted); update build telemetry attributes.
python/packages/core/agent_framework/_workflows/_workflow.py Make name required; compute graph_signature_hash; filter request_info events when responses provided.
python/packages/core/agent_framework/_workflows/_runner_context.py Change checkpoint creation payloads to store live objects; update checkpoint method signatures.
python/packages/core/agent_framework/_workflows/_runner.py Pass workflow_name/graph hash into checkpoints; add previous-checkpoint chaining; remove legacy state hooks.
python/packages/core/agent_framework/_workflows/_events.py Stop encoding/decoding request_info data in to_dict/from_dict (store live objects).
python/packages/core/agent_framework/_workflows/_conversation_state.py Remove legacy chat message encode/decode helpers.
python/packages/core/agent_framework/_workflows/_checkpoint_summary.py Remove checkpoint summary helper.
python/packages/core/agent_framework/_workflows/_checkpoint_encoding.py Replace custom JSON encoding with pickle+base64 marker strategy + type verification.
python/packages/core/agent_framework/_workflows/_checkpoint.py Redesign checkpoint schema + storage protocol; implement in-memory and file storage with new encoding.
python/packages/core/agent_framework/_workflows/_agent_executor.py Store live conversation/cache + pending request structures in checkpoint state.
python/packages/core/agent_framework/_workflows/__init__.py Remove checkpoint summary exports; keep updated checkpoint exports.
python/.cspell.json Add checkpoint-related words.
Comments suppressed due to low confidence (1)

python/packages/core/agent_framework/_workflows/_runner_context.py:230

  • RunnerContext.load_checkpoint() is declared (and documented) as returning WorkflowCheckpoint | None, but InProcRunnerContext.load_checkpoint() now returns a non-optional checkpoint and relies on storage raising when missing. This is an API/typing mismatch that will confuse callers and forces redundant None checks. Align the protocol + docs with the new behavior (raise WorkflowCheckpointException / return non-optional), and adjust call sites accordingly.
    async def load_checkpoint(self, checkpoint_id: CheckpointID) -> WorkflowCheckpoint | None:
        """Load a checkpoint without mutating the current context state.

        Args:
            checkpoint_id: The ID of the checkpoint to load.

        Returns:
            The loaded checkpoint, or None if it does not exist.
        """

TaoChenOSU changed the title [BREAKING] Python: WIP: Checkpoint refactor: encode/decode, checkpoint format, etc to [BREAKING] Python: Checkpoint refactor: encode/decode, checkpoint format, etc Feb 10, 2026
TaoChenOSU moved this to In Review in Agent Framework Feb 10, 2026

Labels

python, workflows (Related to Workflows in agent-framework)

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

Python: Simplify checkpoint encoding - evaluate pickle vs custom JSON encoding

2 participants